Goto

Collaborating Authors

 lottery ticket hypothesis





8b2fc235787852ead92da2268cd9e90c-Paper-Conference.pdf

Neural Information Processing Systems

In recent years, deep learning has become a staple solution to different tasks, such as computer vision,bio-informatics,speechrecognition,andmanymore.





On the Sparsity of the Strong Lottery Ticket Hypothesis

Neural Information Processing Systems

Considerable research efforts have recently been made to show that a random neural network $N$ contains subnetworks capable of accurately approximating any given neural network that is sufficiently smaller than $N$, without any training. This line of research, known as the Strong Lottery Ticket Hypothesis (SLTH), was originally motivated by the weaker Lottery Ticket Hypothesis, which states that a sufficiently large random neural network $N$ contains sparse subnetworks that can be trained efficiently to achieve performance comparable to that of training the entire network $N$.Despite its original motivation, results on the SLTH have so far not provided any guarantee on the size of subnetworks.Such limitation is due to the nature of the main technical tool leveraged by these results, the Random Subset Sum (RSS) Problem.Informally, the RSS Problem asks how large a random i.i.d.


Validating the Lottery Ticket Hypothesis with Inertial Manifold Theory

Neural Information Processing Systems

Despite achieving remarkable efficiency, traditional network pruning techniques often follow manually-crafted heuristics to generate pruned sparse networks. Such heuristic pruning strategies are hard to guarantee that the pruned networks achieve test accuracy comparable to the original dense ones. Recent works have empirically identified and verified the Lottery Ticket Hypothesis (LTH): a randomly-initialized dense neural network contains an extremely sparse subnetwork, which can be trained to achieve similar accuracy to the former. Due to the lack of theoretical evidence, they often need to run multiple rounds of expensive training and pruning over the original large networks to discover the sparse subnetworks with low accuracy loss.


The Lottery Ticket Hypothesis for Pre-trained BERT Networks

Neural Information Processing Systems

In natural language processing (NLP), enormous pre-trained models like BERT have become the standard starting point for training on a range of downstream tasks, and similar trends are emerging in other areas of deep learning. In parallel, work on the lottery ticket hypothesis has shown that models for NLP and computer vision contain smaller matching subnetworks capable of training in isolation to full accuracy and transferring to other tasks. In this work, we combine these observations to assess whether such trainable, transferrable subnetworks exist in pre-trained BERT models. For a range of downstream tasks, we indeed find matching subnetworks at 40% to 90% sparsity. We find these subnetworks at (pre-trained) initialization, a deviation from prior NLP research where they emerge only after some amount of training. Subnetworks found on the masked language modeling task (the same task used to pre-train the model) transfer universally; those found on other tasks transfer in a limited fashion if at all. As large-scale pre-training becomes an increasingly central paradigm in deep learning, our results demonstrate that the main lottery ticket observations remain relevant in this context.